13. SHAP and Interventions

  • causal graphs and the do-operator
  • marginal SHAP as intervention
  • code examples

Learning outcomes

  1. Compare the competing definitions of interpretable machine learning, the motivations behind them, and metrics that can be used to quantify whether they have been met.
  2. Describe the theoretical foundations of post-hoc explanation methods such as SHAP values and linear probes, and apply them to realistic case studies with appropriate validation checks.

Causal View

Recall

Setup. Algorithm \(f(x_1, x_2) = x_1\) trained on data where \(X_1 \equiv X_2\).

Conditional SHAP attributes non-zero importance to \(X_2\).

Marginal SHAP correctly gives \(\varphi_2 = 0\).

Question. Should we have preferred Marginal SHAP all along?

Causal View of Post-hoc Explanation

The function-to-explain can be viewed as a causal graph.

(draw causal graph with X and \(\tilde{X}\))

This distinguishes between observed data \(\tilde{X}\) and function inputs \(X\).

When we set \(X_1 = x_1\) and let other inputs vary, how does \(Y\) change? It depends on how we vary the other inputs.

Observation vs. Intervention

Observational (seeing). \(\mathbf{E}[Y \mid X_1 = x_1]\)

“Among data with \(X_1 = x_1\), what is average \(Y\)?”

Interventional (doing). \(\mathbf{E}[Y \mid \text{do}(X_1 = x_1)]\)

“If we force \(X_1 = x_1\) at the function’s input, leaving the other inputs’ distribution unchanged, what is the average \(Y\)?”

These differ when features are correlated in training data.
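A small simulation makes the gap concrete. This is a sketch: the bivariate-normal setup, the correlation \(\rho = 0.9\), and the toy function `f` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, x1, n = 0.9, 1.0, 200_000   # correlation, fixed value of X1, sample size

def f(a, b):                     # hypothetical model-to-explain
    return a + b

# Seeing: sample X2 from p(x2 | X1 = x1) for standard bivariate normals
x2_cond = rng.normal(rho * x1, np.sqrt(1 - rho**2), n)
seeing = f(x1, x2_cond).mean()   # ≈ x1 + rho * x1 = 1.9

# Doing: do(X1 = x1) leaves the marginal p(x2) untouched
x2_marg = rng.normal(0.0, 1.0, n)
doing = f(x1, x2_marg).mean()    # ≈ x1 + E[X2] = 1.0

print(round(seeing, 2), round(doing, 2))
```

With independent features (\(\rho = 0\)), the two estimates coincide.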

Backdoor Criterion

Consider:

       Z
      / \
     ↓   ↓
   X₁ → Y ← X₂

Backdoor formula. \[\mathbf{E}[Y \mid \text{do}(X_1 = x_1)] = \int \mathbf{E}[Y \mid x_1, x_2] p(x_2) \, dx_2\]

The do-operator deletes incoming edges to \(X_1\). Edge \(Z \to X_1\) is severed.

Backdoor Criterion

Contrast this with the formula for the usual conditional expectation.

\[\mathbf{E}[Y \mid \text{do}(X_1 = x_1)] = \int \mathbf{E}[Y \mid x_1, x_2] p(x_2) \, dx_2\]

\[\mathbf{E}[Y \mid X_1 = x_1] = \int \mathbf{E}[Y \mid x_1, x_2] p(x_2 \vert x_{1}) \, dx_2\]
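On a concrete structural model consistent with the graph above, the two formulas give different numbers. The binary SCM below (noise rates, coefficients) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical SCM matching the graph: Z -> X1, Z -> X2, X1 -> Y <- X2
z  = rng.integers(0, 2, n)
x1 = np.where(rng.random(n) < 0.1, 1 - z, z)   # noisy copies of the confounder
x2 = np.where(rng.random(n) < 0.1, 1 - z, z)
y  = x1 + 2 * x2

# Seeing: condition on X1 = 1, which inherits the Z-induced correlation with X2
seeing = y[x1 == 1].mean()                     # ≈ 1 + 2 * P(X2=1 | X1=1) = 2.64

# Doing: backdoor formula, averaging E[Y | x1=1, x2] under the *marginal* p(x2)
e_y   = {v: y[(x1 == 1) & (x2 == v)].mean() for v in (0, 1)}
doing = sum(e_y[v] * (x2 == v).mean() for v in (0, 1))   # ≈ 1 + 2 * E[X2] = 2.0

print(round(seeing, 2), round(doing, 2))
```

The backdoor estimate recovers the interventional quantity from purely observational samples, because conditioning on \(X_2\) blocks the path \(X_1 \leftarrow Z \to X_2 \to Y\).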

Application to Post-hoc Explanation

Algorithm \(f\) has no confounders:

(draw causal graph with X and \(\tilde{X}\))

\[\mathbf{E}[Y \mid \text{do}(X_S = x_S)] = \int f(x_S, x_{\bar{S}}) p(x_{\bar{S}}) \, dx_{\bar{S}}\]

No backdoor paths exist, so no conditioning is needed and this is the marginal expectation.

Interventional View

Algebraic. \[\mathbf{E}[f(x_S, X_{\bar{S}})] = \int f(x_S, x_{\bar{S}}) p(x_{\bar{S}}) \, dx_{\bar{S}}\]

Causal.

Intervention \(\text{do}(X_S = x_S)\) deletes any edges into \(X_S\). Distribution \(p(x_{\bar{S}})\) unchanged.

Computational.

def marginal_value(f, x_S, n_samples):
    # Monte Carlo estimate of E[f(x_S, X_S̄)]; sampler is a placeholder
    predictions = []
    for _ in range(n_samples):
        x_complement = sample_from_training_data()  # independent of x_S
        predictions.append(f(x_S, x_complement))
    return sum(predictions) / n_samples

Conditional View

Algebraic. \[\mathbf{E}[f(x_S, X_{\bar{S}}) \mid X_S = x_S] = \int f(x_S, x_{\bar{S}}) p(x_{\bar{S}} \mid x_S) \, dx_{\bar{S}}\]

Causal. Asks “what do we see when \(X_S = x_S\)?” not “what happens when we set \(X_S = x_S\)?”

Computational.

def conditional_value(f, x_S, n_samples):
    # Monte Carlo estimate of E[f(x_S, X_S̄) | X_S = x_S]; sampler is a placeholder
    predictions = []
    for _ in range(n_samples):
        x_complement = sample_conditional_on(x_S)  # depends on x_S
        predictions.append(f(x_S, x_complement))
    return sum(predictions) / n_samples

Two Causal Graphs

Janzing et al. (2020) argue that we should focus on the “Algorithm” causal structure.

Data generation (real world).

    Z̃ (confounders)
   / \
  ↓   ↓
 X̃₁  X̃₂

How do real-world features relate?

Algorithm (computation).

X₁ → Y = f(X)
X₂ →

How does the prediction function process inputs?

Formal Separation

Janzing et al. (2020) distinguish:

  • \(\tilde{X}_j\): real-world feature
  • \(X_j\): algorithm input

Usually \(X_j = \tilde{X}_j\), but they are distinct objects with distinct causal graphs.

Reality. Cannot change \(\tilde{X}_1\) without affecting \(\tilde{X}_2\) (shared causes).

Algorithm. Can evaluate \(f(x_1, x_2)\) for any values.

SHAP explains \(f\), so arguably should use the algorithm’s causal structure.

Binary Data Example

Setup. \(f(x_1, x_2) = x_1\) where \(X_1 \equiv X_2\) in data.

Marginal. \[v^{\text{marg}}(\{2\}) = \int x_1 p(x_1) dx_1 = \mathbf{E}[X_1]\]

Conditional. \[v^{\text{cond}}(\{2\}) = \int x_1 p(x_1 \mid x_2) dx_1 = x_2\]

Conditional approach “sees” correlation. Marginal approach intervenes in the algorithm.
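Both value functions can be estimated by sampling. A minimal sketch, assuming binary data with \(X_1 \equiv X_2\) and the algorithm \(f(x_1, x_2) = x_1\) from the setup above:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.integers(0, 2, 100_000)
x2 = x1.copy()                    # X1 ≡ X2 in the training data

def f(a, b):                      # the algorithm ignores its second input
    return a

x2_obs = 1                        # evaluate the value function for S = {2} at x2 = 1

# Marginal: draw X1 from p(x1), independent of the observed x2
v_marg = f(x1, x2_obs).mean()                 # ≈ E[X1] = 0.5

# Conditional: draw X1 from p(x1 | X2 = x2_obs)
v_cond = f(x1[x2 == x2_obs], x2_obs).mean()   # = x2_obs = 1.0

print(round(v_marg, 2), v_cond)
```

The conditional value function moves with \(x_2\) even though the algorithm never reads it; the marginal value function does not.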

Code Examples

Binary Example

Algorithm: \(\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x_1 + \hat{\alpha}_2 x_2\)

True causal effect of \(X_2\) on \(Y\): zero.

Marginal SHAP

Changing \(x_2\) (while sampling \(x_1\) independently) does not affect predictions.

Correctly identifies \(X_2\) as irrelevant.

Conditional SHAP

Observing \(x_2 = 1\) implies \(x_1 \approx 1\), so \(x_2\) appears predictive.

Correlation \(\neq\) causal relevance in the algorithm.

Gaussian Example

True Attribution

Linear models: \(\varphi_j = \alpha_j (x_j - \mathbf{E}[X_j])\)

Since \(\alpha_1 = 0\), we have \(\varphi_1 = 0\) for all observations.

Algorithm does not use \(X_1\).
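Because marginal SHAP for a linear model has this closed form, the claim can be checked directly. The coefficients, correlation structure, and observation below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([0.0, 1.0, -0.5])     # hypothetical coefficients; alpha_1 = 0

# Correlated Gaussian features: X1 tracks X2, but the model ignores X1
n  = 100_000
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
x3 = rng.normal(size=n)
X  = np.column_stack([x1, x2, x3])

x = np.array([2.0, 1.0, 0.5])          # observation to explain

# For linear f(x) = alpha @ x, marginal SHAP is alpha_j * (x_j - E[X_j])
phi = alpha * (x - X.mean(axis=0))
print(phi[0])                          # 0: the algorithm never uses X1
```

Conditional SHAP, by contrast, would spread credit onto \(X_1\) through its correlation with \(X_2\).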

Marginal SHAP (Gaussian)

Small error due to finite sampling. Correctly near zero.

Conditional SHAP (Gaussian)

Larger error. Attributes importance to \(X_1\) due to correlation with other features.

Sampling

Marginal sampling will extrapolate: we test the algorithm on combinations of inputs that may never occur together in the training data.
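A sketch of this extrapolation, assuming a near-duplicate feature pair (the correlation of 0.99 and the fixed value are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Near-duplicate features: X1 tracks X2 almost exactly
x2 = rng.normal(size=n)
x1 = 0.99 * x2 + np.sqrt(1 - 0.99**2) * rng.normal(size=n)

# Marginal sampling pairs a fixed x1 with x2 drawn independently of it
x1_fixed = 3.0
x2_marg = rng.choice(x2, size=n)

on_manifold  = np.abs(x1 - x2).mean()             # tiny: data lie near x1 = x2
off_manifold = np.abs(x1_fixed - x2_marg).mean()  # large: synthetic inputs far from data

print(round(on_manifold, 2), round(off_manifold, 2))
```

The synthetic pairs \((x_1^{\text{fixed}}, x_2)\) land far off the data manifold, so marginal SHAP probes the algorithm where it was never trained.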

Attributions Histogram

(Figure: histogram of SHAP values for \(X_1\), count vs. SHAP value, comparing methods X1_cond and X1_marg. Title: “Attribution to X1 (\(\alpha_1 = 0\))”.)

Exercise